Proceedings of the 1st International Workshop on Knowledge Discovery on the WEB

Authors

  • Giuliano Armano
  • Alessandro Bozzon
  • Alessandro Giuliani
Abstract

This paper addresses harvesting all documents that match a given (entity) query from a deep web source. The objective is to retrieve all information about, for instance, “Denzel Washington”, “Iran Nuclear Deal”, or “FC Barcelona” from data hidden behind web forms. The policies of web search engines usually do not allow access to all of the search results matching a given query: they limit both the number of returned documents and the number of user requests. In this work, we propose a new approach that automatically collects information related to a given query from a search engine while respecting these limitations. The approach minimizes the number of queries that need to be sent by applying information from a large external corpus. When tested on Google, the new approach outperforms existing approaches in the total number of unique documents found per query.


Similar resources

Expert Discovery: A web mining approach

Expert discovery is the quest to answer a question: “Who is the best expert of a specific subject in a particular domain within peculiar array of parameters?” An expert with domain knowledge in any field is crucial for consulting in industry, academia and the scientific community. The aim of this study is to address the issues of the expert-finding task in a real-world community. Collabor...


The Effect of Workshop and Multimedia Training Methods on Nurses’ Knowledge and Performance on Blood Transfusion

Background: Blood transfusion faults and their consequences are major concerns of health care systems. This study aimed to determine the effects of workshop and multimedia training methods on nurses’ knowledge and performance regarding blood transfusion. Methods: This was a controlled quasi-experimental study. Sampling was conducted. Data were collected from 37 participants in three hospit...


Classroom-Oriented Higher Education System or Workshop-Oriented Higher Education System (Based on Cost & Economic Approach)

The most important goal of every society is to reach economic development. As both the goal and the agent of development, man bears an important responsibility, which is realized through education, especially higher education, because universities are the main factors for progress, production of knowledge and education of specialized human forces and they play a significant role in...


Proceedings of the 2nd International Workshop on Exploiting Large Knowledge Repositories and the 1st International Workshop on Automatic Text

Knowledge based applications require linguistic, terminological and ontological resources. These applications are used to fulfill a set of tasks such as semantic indexing, knowledge extraction from text, information retrieval, etc. Using these resources and combining them for the same application is a tedious task with different levels of complexity. This requires their representation in a comm...


Exploring Relevance as Truth Criterion on the Web and Classifying Claims in Belief Levels

The Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this paper...




Publication date: 2015